Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral-domain denoising autoencoder (DAE)-based dereverberation method are presented for distant-talking speaker identification, and a combination of the two approaches is proposed. For the DNN-based bottleneck feature, we note that a DNN can transform the reverberant speech feature into a new feature space with greater discriminative power for distant-talking speaker recognition. Meanwhile, cepstral-domain DAE-based dereverberation suppresses reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving distant-talking speaker recognition performance. Since the DNN-based discriminative bottleneck feature and DAE-based dereverberation are strongly complementary, their combination is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set whose reverberant environments differ from the training environments. In suppressing late reverberation, our method outperformed state-of-the-art dereverberation approaches such as multichannel least mean squares (MCLMS). Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved performance.
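The cepstral-domain DAE described above learns a regression from reverberant cepstra to clean cepstra. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the feature dimension, hidden size, learning rate, and the crude frame-smearing simulation of reverberation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 13    # cepstral feature dimension (e.g. MFCC order; assumed)
H = 32    # hidden-layer size (assumed)
N = 256   # number of training frames (synthetic)

# Synthetic "clean" cepstra; reverberation is crudely simulated here as
# smearing energy across neighbouring frames plus a little noise.
clean = rng.standard_normal((N, D))
reverb = (0.7 * clean + 0.3 * np.roll(clean, 1, axis=0)
          + 0.05 * rng.standard_normal((N, D)))

# Single hidden layer with tanh; linear output, since cepstra are real-valued.
W1 = rng.standard_normal((D, H)) * 0.1
b1 = np.zeros(H)
W2 = rng.standard_normal((H, D)) * 0.1
b2 = np.zeros(D)

lr = 0.05
for _ in range(2000):
    h = np.tanh(reverb @ W1 + b1)   # encoder
    out = h @ W2 + b2               # decoder: estimated clean cepstra
    err = out - clean               # gradient of MSE w.r.t. the output
    # Backpropagation through the two layers.
    gW2 = h.T @ err / N
    gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1.0 - h ** 2)
    gW1 = reverb.T @ gh / N
    gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

def dereverberate(cep):
    """Map reverberant cepstra toward clean cepstra with the trained DAE."""
    return np.tanh(cep @ W1 + b1) @ W2 + b2

# After training, the DAE output should lie closer to the clean cepstra
# than the reverberant input does.
mse_before = float(np.mean((reverb - clean) ** 2))
mse_after = float(np.mean((dereverberate(reverb) - clean) ** 2))
```

In the paper's pipeline, the dereverberated cepstra (or, for the other branch, DNN bottleneck activations) would then feed the speaker identification back end; this sketch only shows the denoising mapping itself.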